Transliterating non-ASCII characters with Python

Similar resources

ASCII Escaping of Unicode Characters

There are a number of circumstances in which an escape mechanism is needed in conjunction with a protocol to encode characters that cannot be represented or transmitted directly. With ASCII coding, the traditional escape has been either the decimal or hexadecimal numeric value of the character, written in a variety of different ways. The move to Unicode, where characters occupy two or more octe...

Using Lexical tools to convert Unicode characters to ASCII.

Unicode is an industry standard allowing computers to consistently represent and manipulate text expressed in most of the world's writing systems. It is widely used in multilingual NLP (natural language processing) projects. On the other hand, there are some NLP projects still only dealing with ASCII characters. This paper describes methods of utilizing lexical tools to convert Unicode character...

Transliterating From All Languages

Much of the previous work on transliteration has depended on resources and attributes specific to particular language pairs. In this work, rather than focus on a single language pair, we create robust models for transliterating from all languages in a large, diverse set to English. We create training data for 150 languages by mining name pairs from Wikipedia. We train 13 systems and analyze the...

Ensuring the Security of Text-based Information Transmission by Utilizing Invisible ASCII Characters

The universal storage and transmission of information by text-based documents makes information hiding technology an important topic in computer security. Currently, the popularly used text-based information techniques face many problems, such as poor robustness, low embedding rate, semantic clutter, and being easily distinguished by the unaided eye. In order to solve these problems, this pa...

HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters

Most existing computer systems which can handle a text file of arbitrarily mixed Chinese and ASCII characters use 8-bit codes. To exchange such text files through electronic mail on ASCII computer systems, it is necessary to encode them in a 7-bit format. A generic binary to ASCII encoder is not sufficient, because there is currently no universal standard for such 8-bit codes. For example, CCDO...

Journal

Journal title: The Programming Historian

Year: 2013

ISSN: 2397-2068

DOI: 10.46430/phen0032